AUTOREGRESSIVE SPECTRAL ESTIMATION, LOG SPECTRAL SMOOTHING,
AND ENTROPY*



Emanuel Parzen
Institute of Statistics
Texas A&M University
College Station, TX 77843


Abstract 

Spectral estimation is motivated by information divergence
distance. Two methods of spectral estimation are developed in
this paper: autoregressive spectral estimation (section 3) and
log spectral kernel estimation (section 4). They are motivated
as parametric and non-parametric estimators which minimize
"entropy" or "information divergence" distances between raw and
fitted spectral densities. The role of entropy concepts in the
statistical estimation of spectral densities (section 2) is
explained by contrasting it with the role of entropy concepts in
probability density estimation (section 1). Adaptive procedures
for forming, and combining, these estimators for an observed
time series are provided by order-determining and truncation
(half-power) point determining criteria, which are described.

1. The role of entropy concepts in statistical estimation of
probability density functions.

Let X be a continuous random variable, and $X_1, \ldots, X_n$ a
random sample of X (consisting of independent random variables
identically distributed as X). The distribution function F(x)
and the probability density function f(x), $-\infty < x < \infty$,
are defined by

$$F(x) = \Pr(X \le x), \qquad f(x) = F'(x).$$

The entropy (or Shannon information) of X is denoted by H(f)
and is defined by

$$H(f) = \int_{-\infty}^{\infty} \{-\log f(x)\}\, f(x)\, dx = E_f[-\log f(X)].$$

All observed distributions are assumed to have finite entropy.

A maximum entropy density is a probability density f(x)
determined by maximizing H(f) over all f satisfying certain
constraints (usually involving moments of f).

Theorem 1A: Three important densities, and their
characterization by a maximum entropy principle, are:
(1) the uniform distribution over an interval a to b maximizes
H(f) over the constraint that f is non-zero only on the interval
a to b; (2) the exponential distribution with mean $\mu$
maximizes H(f) over the constraint that f is non-zero only for
x > 0, and has mean $\mu$; (3) the normal distribution with mean
$\mu$ and variance $\sigma^2$ maximizes H(f) over the constraint
that f has mean $\mu$ and variance $\sigma^2$.
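
A small numerical sketch may help fix ideas. It assumes the standard closed forms for the three entropies (log(b - a) for the uniform, 1 + log mu for the exponential, and (1/2) log(2 pi e sigma^2) for the normal) and compares the three densities at a common variance of 1. The function names are illustrative only.

```python
import math

def entropy_uniform(a, b):
    """Entropy of the uniform density on (a, b)."""
    return math.log(b - a)

def entropy_exponential(mu):
    """Entropy of the exponential density on x > 0 with mean mu."""
    return 1.0 + math.log(mu)

def entropy_normal(sigma2):
    """Entropy of a normal density with variance sigma2 (the mean plays no role)."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma2)

def entropy_numeric(f, lo, hi, n=200000):
    """Midpoint-rule approximation of H(f) = integral of -log f(x) f(x) dx on (lo, hi)."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        fx = f(lo + (i + 0.5) * h)
        if fx > 0.0:
            total -= math.log(fx) * fx * h
    return total

# Three densities with variance 1.
h_normal = entropy_normal(1.0)
h_uniform = entropy_uniform(0.0, math.sqrt(12.0))   # variance (b - a)^2 / 12 = 1
h_expon = entropy_exponential(1.0)                   # variance mu^2 = 1

# Cross-check one closed form against the defining integral.
h_expon_numeric = entropy_numeric(lambda x: math.exp(-x), 0.0, 50.0)
```

Among these three unit-variance densities the normal has the largest entropy, in agreement with part (3) of the theorem.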

The maximum entropy principle is a probability modeling
principle in the foregoing examples. It becomes a statistical
estimation principle (which fits distributions to data) when the
constraints on f are expressed in terms of sample means and
variances; it is then similar to the method of moments. A
maximum entropy density estimator $\hat f$ can be expressed in
symbols:

$$H(\hat f) = \max_f H(f)$$

where f is constrained to have certain moments equal to the
corresponding sample moments.

*This research was supported by the Office of Naval Research
(Contract N00014-81-MP-10001, ARO DAAG29-80-C-0070).

An alternative (and, we believe, more general) statistical
estimation principle is provided by the cross-entropy H(g;f) and
information divergence I(g;f) of two probability density
functions f(x) and g(x). Define

$$H(g;f) = \int_{-\infty}^{\infty} \{-\log g(x)\}\, f(x)\, dx = E_f[-\log g(X)];$$

$$I(g;f) = \int_{-\infty}^{\infty} \Big\{-\log \frac{g(x)}{f(x)}\Big\}\, f(x)\, dx = E_f\Big[-\log \frac{g(X)}{f(X)}\Big].$$

Note that H(f) = H(f;f). Another name for information
divergence is Kullback-Leibler information number. A minimum
information divergence density $\hat g$ is an approximator to a
specified density f determined by

$$I(\hat g;f) = \min_g I(g;f)$$

where g is constrained to belong to a specified parametric
family of probability densities.

Theorem 1B: Three important examples of minimum information
divergence approximators or estimators are: (1) f is assumed to
be positive only over the interval a to b, and g is any uniform
distribution; then $\hat g$ is the uniform distribution over a to
b; (2) f is positive only for $x \ge a$, and has a finite mean
$\mu$, and g is the two parameter exponential distribution; then
$\hat g$ is the exponential distribution with mean $\mu$ and
domain $(a, \infty)$; (3) f has finite mean $\mu$ and variance
$\sigma^2$, and g is any normal distribution; then $\hat g$ is
$N(\mu, \sigma^2)$.

Theorem 1B may have been first explicitly formulated by Thiel
(1981), although it is implicitly known through the equivalence
of maximum likelihood estimation with minimum information
divergence estimation.

An important observation by Thiel is that Theorem 1B can be
used to prove Theorem 1A, and thus avoid the use of the calculus
of variations. We extend this observation to spectral density
estimation in section 2.

A minimum information divergence density $\hat g$ can be
expressed as a minimum cross-entropy density:

$$H(\hat g;f) = \min_g H(g;f).$$

A cross-entropy can be defined for an arbitrary (including
discrete) distribution function F(x) by

$$H(g;F) = \int_{-\infty}^{\infty} \{-\log g(x)\}\, dF(x).$$

Therefore a minimum information divergence density $\hat g$ can
be defined for an arbitrary distribution function F by

$$H(\hat g;F) = \min_g H(g;F)$$

where g is constrained to belong to a specified parametric
family of probability densities. Theorem 1B is true for this
definition.
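
Part (3) of Theorem 1B can be checked numerically when F is the sample distribution function of a small data set. The sketch below evaluates the sample cross-entropy, which is then the average of -log g(X_j), for normal densities g; it uses made-up data and illustrative names, and is only a sanity check rather than a proof.

```python
import math

def cross_entropy_normal(sample, mean, var):
    """Average of -log g(X_j) for a normal density g with the given mean and variance."""
    n = len(sample)
    return sum(0.5 * math.log(2.0 * math.pi * var) + (x - mean) ** 2 / (2.0 * var)
               for x in sample) / n

sample = [2.1, 3.4, 1.9, 4.2, 2.8, 3.9, 3.1]          # made-up observations
n = len(sample)
m_hat = sum(sample) / n                               # sample mean
v_hat = sum((x - m_hat) ** 2 for x in sample) / n     # sample variance, divisor n

# Cross-entropy at the moment-matching normal density.
best = cross_entropy_normal(sample, m_hat, v_hat)

# Perturbing the mean or the variance should only increase the cross-entropy.
others = [cross_entropy_normal(sample, m_hat + dm, v_hat * s)
          for dm in (-0.5, 0.0, 0.5)
          for s in (0.5, 1.0, 2.0)
          if not (dm == 0.0 and s == 1.0)]
```

The minimizing normal density has the sample mean and the sample variance, and its cross-entropy equals (1/2){log(2 pi v_hat) + 1}.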

Consider now a finite sample $X_1, \ldots, X_n$ and a parametric
model $f_\theta(x)$ for the true probability density f(x),
indexed by parameters $\theta$ which one would like to estimate
from the sample. A maximum likelihood estimator $\hat\theta$ of
$\theta$ is defined as the parameter values $\theta$ maximizing

$$L_n(\theta) = \frac{1}{n} \log f_\theta(X_1, \ldots, X_n) = \frac{1}{n} \sum_{j=1}^{n} \log f_\theta(X_j).$$

Let $\tilde F(x)$, $-\infty < x < \infty$, denote the sample
distribution function defined by

$$\tilde F(x) = \text{fraction of } X_1, \ldots, X_n \le x.$$

One can express

$$L_n(\theta) = \int_{-\infty}^{\infty} \log f_\theta(x)\, d\tilde F(x) = -H(f_\theta; \tilde F).$$

Therefore maximum likelihood parameter estimators $\hat\theta$
yield minimum information divergence densities $f_{\hat\theta}$.

By introducing $\tilde f$ to denote the (symbolic) sample
probability density of the sample, one can regard $\hat\theta$
as satisfying

$$I(f_{\hat\theta}; \tilde f) = \min_\theta I(f_\theta; \tilde f).$$

A re-interpretation of maximum likelihood is obtained by
rewriting the information divergence in terms of quantile
functions whose role in statistical inference is emphasized by
Parzen (1979).

Introduce the sample quantile function

$$\tilde Q(u) = \tilde F^{-1}(u).$$

Its derivative $\tilde q(u) = \tilde Q'(u)$ satisfies

$$\tilde q(u)\, \tilde f(\tilde Q(u)) = 1.$$

Define

$$D_\theta(u) = F_\theta(\tilde Q(u)), \qquad d_\theta(u) = D_\theta'(u) = \frac{f_\theta(\tilde Q(u))}{\tilde f(\tilde Q(u))},$$

where $F_\theta(x)$ is the distribution function with density
$f_\theta(x)$. Make the change of variable $u = \tilde F(x)$,
$x = \tilde Q(u)$ to obtain

$$I(f_\theta; \tilde f) = \int_0^1 -\log d_\theta(u)\, du$$

which one can interpret as a measure of how close to a uniform
density is $d_\theta(u)$. The full consequences of this
interpretation are explored elsewhere.

It should be noted that one can define other measures to
minimize to form parameter estimators: examples are

$$\int_0^1 \{d_\theta(u) - 1\}^2\, du,$$

whose minimization leads to "modified chi-square" estimators,
and

$$\int_0^1 \{F_\theta(\tilde Q(u)) - u\}^2\, du,$$

whose minimization leads to "minimum distance estimators".

Information divergence is the measure that most readily
generalizes to stochastic processes.

It should be noted that only the parameter estimation problem
is efficiently solved by minimizing $I(f_\theta; \tilde f)$. The
problem of goodness of fit is solved by considering the size of
the difference from the uniform distribution $D_0(u) = u$ of
$D_\theta(u) = F_\theta(\tilde Q(u))$ for $\theta = \hat\theta$.
The model identification problem is to find distribution
functions F such that $F(\tilde Q(u))$ is parsimoniously not
significantly different from the uniform distribution
$D_0(u) = u$.


2. The role of entropy concepts in statistical estimation of
spectral density functions.

Let Y(t), t = 1, ..., T be a sample of a Gaussian zero mean
stationary time series with covariance function

$$R(v) = E[Y(t)Y(t+v)], \qquad v = 0, \pm 1, \pm 2, \ldots$$

and correlation function

$$\rho(v) = \frac{R(v)}{R(0)} = \mathrm{Corr}[Y(t), Y(t+v)].$$

We assume R(v) and $\rho(v)$ are absolutely summable, and define
the power spectrum S(w), $0 \le w \le 1$, and the spectral
density f(w), $0 \le w \le 1$, by

$$S(w) = \sum_{v=-\infty}^{\infty} e^{-2\pi i w v} R(v), \qquad f(w) = \sum_{v=-\infty}^{\infty} e^{-2\pi i w v} \rho(v).$$

The spectral distribution function is defined by

$$F(w) = \int_0^w f(w')\, dw', \qquad 0 \le w \le 1.$$

When $\rho(v)$ is not assumed to be summable, there always exists
a spectral distribution function F(w), $0 \le w \le 1$, such
that

$$\rho(v) = \int_0^1 e^{2\pi i w v}\, dF(w).$$

When $\rho(v)$ is summable, it has the spectral representation

$$\rho(v) = \int_0^1 e^{2\pi i w v} f(w)\, dw.$$

A stationary Gaussian time series is called white noise if

$$\rho(v) = 0, \quad v > 0;$$

$$f(w) = 1, \quad 0 \le w \le 1;$$

$$F(w) = w, \quad 0 \le w \le 1.$$

A stationary Gaussian time series with summable correlation
function and integrable log spectral density can be represented
in terms of a white noise time series $\epsilon(t)$ representing
the innovations [prediction errors $Y^\nu(t) = Y(t) - Y^\mu(t)$
of the infinite memory one-step ahead predictor $Y^\mu(t)$ of
Y(t)]. The AR($\infty$), or infinite order autoregressive
representation, is

$$Y(t) + a_\infty(1) Y(t-1) + \cdots + a_\infty(m) Y(t-m) + \cdots = \epsilon(t).$$

The MA($\infty$), or infinite order moving average
representation, is

$$Y(t) = \epsilon(t) + b_\infty(1) \epsilon(t-1) + b_\infty(2) \epsilon(t-2) + \cdots$$

A finite parameter representation is an ARMA(p,q) of the form

$$Y(t) + a_p(1) Y(t-1) + \cdots + a_p(p) Y(t-p) = \epsilon(t) + b_q(1) \epsilon(t-1) + \cdots + b_q(q) \epsilon(t-q).$$

The filter relating Y(t) and $\epsilon(t)$ is called a
whitening filter. Parameter estimation is the theory of
estimation of the parameters of the whitening filter and model
identification is the theory of estimation of the structural
form of the whitening filter. To develop approaches to parameter
estimation for a random sample, in section 1 we defined the
following concepts:

    Entropy H(f),
    Maximum entropy density $\hat f$,
    Cross-entropy H(g;f),
    Information divergence I(g;f),
    Minimum information divergence density,
    Minimum cross-entropy density,
    Likelihood of a sample,
    Maximum likelihood parameter estimator.

To develop approaches to estimation of the parameters $\theta$
of a parametric model $f_\theta(w)$ of the spectral density f(w)
of a stationary zero mean Gaussian time series Y(t), we develop
analogues of the foregoing concepts. We start with an
approximate formula for the likelihood function

$$L_T(\theta) = \frac{1}{T} \log f_\theta(Y(1), \ldots, Y(T))$$

of the time series sample. We assume that Y(t) has been divided
by $\{R(0)\}^{1/2}$ so that it can be considered to have variance
1, and its covariance function equals its correlation function.



The first step in analyzing a time series should be to compute
the sample correlation function

$$\hat\rho(v) = \sum_{t=1}^{T-v} Y(t) Y(t+v) \div \sum_{t=1}^{T} Y^2(t)$$

and the sample spectral density

$$\tilde f(w) = \Big| \sum_{t=1}^{T} Y(t) \exp(-2\pi i w t) \Big|^2 \div \sum_{t=1}^{T} Y^2(t) = \sum_{|v|<T} e^{-2\pi i w v} \hat\rho(v).$$
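
The two expressions for the sample spectral density (squared modulus of the Fourier transform of the data, and Fourier transform of the sample correlations) can be compared directly. The following sketch uses plain Python and made-up data; the function names are ours, and a production implementation would more likely use an FFT.

```python
import cmath
import math

def sample_correlations(y):
    """rho^(v) = sum_{t=1}^{T-v} Y(t)Y(t+v) / sum_{t=1}^{T} Y(t)^2, for v = 0, ..., T-1."""
    T = len(y)
    denom = sum(x * x for x in y)
    return [sum(y[t] * y[t + v] for t in range(T - v)) / denom for v in range(T)]

def sample_spectral_density(y, w):
    """|sum_t Y(t) exp(-2 pi i w t)|^2 / sum_t Y(t)^2, with t running from 1 to T."""
    denom = sum(x * x for x in y)
    dft = sum(x * cmath.exp(-2j * math.pi * w * (t + 1)) for t, x in enumerate(y))
    return abs(dft) ** 2 / denom

def spectral_density_from_correlations(rho, w):
    """Sum over |v| < T of exp(-2 pi i w v) rho^(v), using rho^(-v) = rho^(v)."""
    return rho[0] + 2.0 * sum(r * math.cos(2.0 * math.pi * w * v)
                              for v, r in enumerate(rho) if v > 0)

y = [0.3, -1.2, 0.8, 1.5, -0.4, -0.9, 0.2, 1.1]   # made-up values, treated as zero mean
rho = sample_correlations(y)
f_direct = sample_spectral_density(y, 0.15)
f_from_rho = spectral_density_from_correlations(rho, 0.15)
```

Both routes give the same value at any frequency w, up to rounding error.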


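It should be emphasized that in practice one should consider
using a "data window" to compute $\tilde f(w)$, for w = k/Q,
k = 0, 1, ..., Q-1, by

$$\tilde f(w) = |\tilde h(w)|^2 \div \frac{1}{Q} \sum_{k=0}^{Q-1} \Big|\tilde h\Big(\frac{k}{Q}\Big)\Big|^2,$$

$$\tilde h(w) = \sum_{t=1}^{T} Y(t)\, K\Big(\frac{t}{T}\Big) \exp(-2\pi i w t)$$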


for a suitable kernel K(x) (properties of windows are discussed
in Harris (1978)). In addition, for statistical stability one
should then slightly smooth $\tilde f(w)$: (1) compute the sample
correlation function by

$$\hat\rho(v) = \frac{1}{Q} \sum_{k=0}^{Q-1} \exp\Big(2\pi i \frac{k}{Q} v\Big)\, \tilde f\Big(\frac{k}{Q}\Big),$$

which holds for $0 \le v \le Q-T$ (and therefore one may want to
choose Q > 2T); (2) compute a slightly smoothed sample spectral
density by

$$\tilde f(w) = \sum_{|v|<T} \exp(-2\pi i w v)\, k\Big(\frac{v}{M}\Big)\, \hat\rho(v)$$

where M > T/2 and k(u) is a suitable kernel, such as the Parzen
lag window:

$$k(u) = 1 - 6u^2 + 6|u|^3, \quad |u| \le 0.5,$$

$$k(u) = 2(1 - |u|)^3, \quad 0.5 \le |u| \le 1,$$

$$k(u) = 0, \quad |u| > 1.$$

Back at the likelihood ranch, one may show that approximately

$$-L_T(\theta) = \frac{1}{2} \log 2\pi + H(f_\theta; \tilde f)$$

where

$$H(f_\theta; \tilde f) = \frac{1}{2} \int_0^1 \Big\{ \log f_\theta(w) + \frac{\tilde f(w)}{f_\theta(w)} \Big\}\, dw.$$

This formula for likelihood shows that the sample spectral
density $\tilde f(w)$ is a sufficient statistic for a time
series. However, it is a very wiggly function and by itself is
not a consistent estimator of f(w). Estimators $\hat f(w)$ of
f(w) can be regarded as "smoothings" of $\tilde f(w)$, but the
basic problem is how much to smooth.

Another aspect of the likelihood formula is its justification
as an approximation. To those misguided analysts for whom
maximum likelihood provides the ultimate estimator for which no
expense should be spared, there is no substitute for the exact
likelihood (which of course is exact only if the model being
assumed is exactly true). Information concepts enter estimation
theory when one recognizes that maximum likelihood estimation is
a technical device for carrying out minimum information
divergence estimation. The information divergence for a sample
Y(t), t = 1, ..., T, is defined in general by

$$I_T(f_\theta; f) = -\frac{1}{T} E_f \Big[ \log \frac{f_\theta(Y(1), \ldots, Y(T))}{f(Y(1), \ldots, Y(T))} \Big]$$

where here f denotes the true probability density of the sample,
and $f_\theta$ is a model for f. It should be noted that we are
using the notation f and $f_\theta$ with a variety of meanings.
For a Gaussian zero mean stationary time series, the probability
density of the sample is specified by the spectral densities,
f(w) of the true distribution and $f_\theta(w)$ of the model. We
continue to denote the information divergence by
$I_T(f_\theta; f)$ but now f indicates a spectral density rather
than a probability density. Pinsker (1963) proves a formula for
$I_T(f_\theta; f)$ in the limit as $T \to \infty$:

$$\lim_{T \to \infty} I_T(f_\theta; f) = I(f_\theta; f)$$

where $I(f_\theta; f)$ is the information divergence defined as
follows.

For two spectral densities f and g, the information divergence
I(g;f), cross-entropy H(g;f), and entropy H(f) are defined:

$$I(g;f) = \frac{1}{2} \int_0^1 \Big\{ \frac{f(w)}{g(w)} - \log \frac{f(w)}{g(w)} - 1 \Big\}\, dw = H(g;f) - H(f;f),$$

$$H(g;f) = \frac{1}{2} \int_0^1 \Big\{ \log g(w) + \frac{f(w)}{g(w)} \Big\}\, dw,$$

$$H(f) = H(f;f) = \frac{1}{2} \int_0^1 \{ \log f(w) + 1 \}\, dw.$$

Since $u - \log u - 1 \ge 0$ for all u > 0, I has two of the
properties of a distance: $I(g;f) \ge 0$, $I(f;f) = 0$. However I
does not satisfy the triangle inequality.
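
These definitions are easy to evaluate numerically. The sketch below assumes the forms H(g;f) = (1/2) * integral over (0,1) of {log g(w) + f(w)/g(w)} dw and I(g;f) = H(g;f) - H(f;f), and applies them to first-order autoregressive spectral densities, chosen by us purely as a convenient test family; integrals are approximated by the midpoint rule.

```python
import math

def ar1_density(rho1):
    """Spectral density sum_v rho1^|v| exp(-2 pi i w v); it integrates to 1 over (0, 1)."""
    def f(w):
        return (1.0 - rho1 ** 2) / (1.0 - 2.0 * rho1 * math.cos(2.0 * math.pi * w) + rho1 ** 2)
    return f

def cross_entropy(g, f, n=4096):
    """H(g; f) = (1/2) * integral over (0,1) of {log g(w) + f(w)/g(w)} dw (midpoint rule)."""
    total = 0.0
    for k in range(n):
        w = (k + 0.5) / n
        total += math.log(g(w)) + f(w) / g(w)
    return 0.5 * total / n

def divergence(g, f, n=4096):
    """I(g; f) = H(g; f) - H(f; f)."""
    return cross_entropy(g, f, n) - cross_entropy(f, f, n)

f = ar1_density(0.6)
g = ar1_density(0.3)

def white(w):
    """Spectral density of white noise."""
    return 1.0

i_gf = divergence(g, f)      # model g, true density f
i_fg = divergence(f, g)      # roles exchanged
i_ff = divergence(f, f)      # zero by definition
i_wf = divergence(white, f)  # white noise model for an AR(1) truth
```

In this example I(g;f) and I(f;g) are both positive but unequal, and the divergence of the white noise model from f equals -(1/2) log(1 - 0.36), that is, minus one half of the integral of log f.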

The information divergence can be related to the $L_2$ log
spectral density distance

$$L_2L(f,g) = \int_0^1 \{ \log f(w) - \log g(w) \}^2\, dw,$$

using the fact that
$u = \exp(\log u) \approx 1 + \log u + \frac{1}{2} (\log u)^2$,
so that $u - \log u - 1 \approx \frac{1}{2} (\log u)^2$. When f
and g are "neighbors" in the sense that their ratio approximates
1,

$$I(g;f) \approx \frac{1}{4} L_2L(f,g);$$

then minimizing I is equivalent to minimizing $L_2L$.

An extensive discussion of these distances is given by Gray,
Buzo, Gray, and Matsuyama (1980).

The concepts have now been defined to state some of the basic
facts of parameter estimation theory.

Maximum likelihood estimators $\hat\theta$ are equivalent to
sample minimum cross-entropy estimators $\hat\theta$ defined by

$$H(f_{\hat\theta}; \tilde f) = \min_\theta H(f_\theta; \tilde f).$$

They can be regarded as estimators of the population minimum
cross-entropy "parameters" $\theta^*$ defined by

$$H(f_{\theta^*}; f) = \min_\theta H(f_\theta; f),$$

where f is the true spectral density.

A maximum entropy spectral density $\hat f$ is defined by

$$H(\hat f) = \max_f H(f)$$

where f is constrained to satisfy a set of constraints of the
form

$$\int_0^1 k_j(w) f(w)\, dw = C_j, \qquad j = 1, \ldots, M,$$

for M specified functions $k_j(w)$ and constants $C_j$.

When the constraints are of the form

$$\int_0^1 e^{2\pi i w j} f(w)\, dw = \rho(j), \qquad j = 0, \pm 1, \ldots, \pm m,$$

it can be shown that $\hat f(w)$ is the autoregressive spectral
density $f_m(w)$ defined as follows:

$$f_m(w) = \sigma_m^2\, |g_m(e^{2\pi i w})|^{-2}$$

where

$$g_m(z) = 1 + a_m(1) z + \cdots + a_m(m) z^m,$$

the autoregressive coefficients $a_m(1), \ldots, a_m(m)$ satisfy
normal equations (called Yule-Walker equations)

$$\sum_{k=0}^{m} a_m(k) \rho(k-j) = 0, \qquad j = 1, \ldots, m,$$

where $a_m(0) = 1$; and

$$\sigma_m^2 = \sum_{k=0}^{m} a_m(k) \rho(k).$$

It should be noted that from a sequence $\rho(v)$,
v = 0, 1, ..., one can quickly compute $f_m(w)$ for all
successive values of m = 1, 2, ... using a variety of fast
algorithms [see Kailath (1974)]. In practice the problem is to
determine "optimal" values of m.
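
One well-known fast algorithm of this kind is the Levinson-Durbin recursion, which solves the Yule-Walker equations for orders 1, 2, ... in succession. The sketch below is our own minimal version, not necessarily the algorithm intended by the reference; it uses the sign convention Y(t) + a_m(1)Y(t-1) + ... + a_m(m)Y(t-m) = eps(t), and the correlation values are made up.

```python
def levinson_durbin(rho, max_order):
    """Solve the Yule-Walker equations for orders 0, 1, ..., max_order.

    rho[v] holds the correlation at lag v, with rho[0] == 1.
    Returns (coeffs, variances) where coeffs[m] = [1, a_m(1), ..., a_m(m)]
    and variances[m] = sigma_m^2.
    """
    a = [1.0]
    sigma2 = rho[0]
    coeffs = [a]
    variances = [sigma2]
    for m in range(1, max_order + 1):
        # Correlation of the order m-1 residual with the newly admitted lag.
        delta = sum(a[k] * rho[m - k] for k in range(m))
        refl = -delta / sigma2                      # equals a_m(m)
        a = [1.0] + [a[k] + refl * a[m - k] for k in range(1, m)] + [refl]
        sigma2 *= (1.0 - refl * refl)               # innovation variance update
        coeffs.append(a)
        variances.append(sigma2)
    return coeffs, variances

rho = [1.0, 0.6, 0.2, -0.1]                         # made-up correlations
coeffs, variances = levinson_durbin(rho, 3)
```

For these correlations the order 2 solution is a_2(1) = -0.75, a_2(2) = 0.25 with innovation variance 0.6, and each order satisfies both the normal equations and the formula for sigma_m^2 stated above.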

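Some important properties of $f_m(w)$ are:

(1) $\int_0^1 e^{2\pi i w j} f_m(w)\, dw = \rho(j)$,
    $j = 0, \pm 1, \ldots, \pm m$;

(2) $\int_0^1 \frac{f(w)}{f_m(w)}\, dw = 1$;

(3) $\int_0^1 \log f_m(w)\, dw = \log \sigma_m^2$;

(4) $H(f_m; f) = H(f_m) = \frac{1}{2} \{ \log \sigma_m^2 + 1 \}$;

(5) $\sigma_m^2 = \min_{a(1), \ldots, a(m)} \int_0^1 |1 + a(1) e^{2\pi i w} + \cdots + a(m) e^{2\pi i w m}|^2 f(w)\, dw$;

(6) $g_m(z)$ has all its roots in the complex plane outside the
    unit circle;

(7) $\lim_{m \to \infty} \log \sigma_m^2 = \log \sigma_\infty^2$,
    where $\log \sigma_\infty^2 = \int_0^1 \log f(w)\, dw$;

(8), (9) ... differentiable (the rate of convergence of $f_m(w)$
    depends on the rate of convergence of $\sigma_m^2$ to
    $\sigma_\infty^2$);

(10) $2 I(f_m; f) = \log \sigma_m^2 - \log \sigma_\infty^2 \to 0$
    as $m \to \infty$.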
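
Several of these properties can be verified numerically for a small example. The sketch below takes the order 2 autoregressive spectral density with coefficients (1, -0.75, 0.25) and innovation variance 0.6, which is the Yule-Walker solution for rho(1) = 0.6 and rho(2) = 0.2, and checks that it reproduces the correlations, that the integral of its logarithm is log sigma_m^2, and that the roots of g_m(z) lie outside the unit circle. The helper names are ours.

```python
import cmath
import math

def ar_spectral_density(a, sigma2, w):
    """sigma^2 / |g(e^{2 pi i w})|^2 with g(z) = sum_k a[k] z^k and a[0] = 1."""
    z = cmath.exp(2j * math.pi * w)
    g = sum(ak * z ** k for k, ak in enumerate(a))
    return sigma2 / abs(g) ** 2

def integrate(func, n=2048):
    """Midpoint rule on (0, 1); very accurate for smooth periodic integrands."""
    return sum(func((k + 0.5) / n) for k in range(n)) / n

a, sigma2 = [1.0, -0.75, 0.25], 0.6

def f2(w):
    return ar_spectral_density(a, sigma2, w)

# Correlations recovered from the spectral density (f2 is symmetric about w = 1/2,
# so the complex exponential can be replaced by a cosine).
rho0 = integrate(f2)
rho1 = integrate(lambda w: math.cos(2.0 * math.pi * w) * f2(w))
rho2 = integrate(lambda w: math.cos(4.0 * math.pi * w) * f2(w))

# Integral of the log spectral density.
mean_log = integrate(lambda w: math.log(f2(w)))

# Roots of g(z) = 1 + a(1) z + a(2) z^2 by the quadratic formula.
disc = cmath.sqrt(a[1] ** 2 - 4.0 * a[2])
roots = [(-a[1] + disc) / (2.0 * a[2]), (-a[1] - disc) / (2.0 * a[2])]
```

Here rho0, rho1, rho2 come out as 1, 0.6, 0.2, the mean of log f_2 equals log 0.6, and both roots have modulus 2.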

The foregoing facts explain why autoregressive spectral
approximators, introduced in Parzen (1968), (1969), provide
powerful, and natural, estimators of an unknown spectral
density. They are generated by the "maximum entropy approach"
introduced by Burg (1967). However the Burg algorithm does not
compute the autoregressive coefficients and the innovation
variances $\sigma_m^2$ by the Yule-Walker equations. Indeed it
does not compute either $\hat\rho(v)$ or $\tilde f(w)$. It does
not provide insight into how to identify "optimal"
autoregressive orders m.

One approach to defining criteria for an optimal order m is to
examine how well one has transformed to white noise the residual
series

$$\tilde\epsilon_m(t) = Y(t) + \hat a_m(1) Y(t-1) + \cdots + \hat a_m(m) Y(t-m)$$

whose spectral density is given by

$$\tilde f_m(w) = \hat\sigma_m^{-2}\, |\hat g_m(e^{2\pi i w})|^2\, \tilde f(w) = \frac{\tilde f(w)}{\hat f_m(w)}.$$

A "model identification" determined order m has the property
that the spectral distribution function

$$\tilde F_m(w) = \int_0^w \tilde f_m(w')\, dw', \qquad 0 \le w \le 1,$$

is parsimoniously not significantly different from the uniform
distribution $F_0(w) = w$ representing the spectral distribution
function of white noise.

In deriving autoregressive spectral estimators or
approximators, we have so far developed an analog of Theorem 1A,
by stating that autoregression provides maximum entropy
estimators subject to the constraint that certain correlation
values are attained. We prefer an analog of Theorem 1B, which
states that: the minimum information divergence parameter
estimators of an autoregressive model for the true spectral
density are provided by the coefficients which satisfy the
Yule-Walker equations.

A very important fact (that may not be widely known) is that
the maximum entropy properties of autoregressive spectral
densities follow from their minimum information divergence
properties, using the fact that

$$I(f_m; f) = H(f_m) - H(f) \ge 0;$$

consequently $H(f) \le H(f_m)$. Since f and $f_m$ satisfy the
constraint that their first m correlations equal specified
values, the entropy H(f) achieves its maximum value at
$f = f_m$.

3. Autoregressive spectral estimators and order determining
criteria.

Given a sample Y(t), t = 1, ..., T, there are many approaches
for forming autoregressive spectral estimators, because [as
summarized in Parzen (1981)] there are four equivalent ways of
parametrizing them: (A) autoregressive coefficients,
(B) correlations, (C) partial correlations, and (D) innovation
variances. Here we only consider starting with the sample
correlations $\hat\rho(v)$. Then for m = 1, 2, ... one forms

$$\hat f_m(w) = \hat\sigma_m^2\, |\hat g_m(e^{2\pi i w})|^{-2}$$

where

$$\hat g_m(z) = 1 + \hat a_m(1) z + \cdots + \hat a_m(m) z^m;$$

the sample autoregressive coefficients $\hat a_m(j)$ satisfy the
sample Yule-Walker equations

$$\sum_{k=0}^{m} \hat a_m(k) \hat\rho(k-j) = 0, \qquad j = 1, \ldots, m,$$

where $\hat a_m(0) = 1$; and the order m innovation variance

$$\hat\sigma_m^2 = \sum_{k=0}^{m} \hat a_m(k) \hat\rho(k).$$

Define $\tilde\sigma^2$ by

$$\log \tilde\sigma^2 = \int_0^1 \log \tilde f(w)\, dw.$$

Then as m tends to T,

$$2 I(\hat f_m; \tilde f) = \log \hat\sigma_m^2 - \log \tilde\sigma^2 \to 0.$$





We desire $\hat f_m$ to be a sequence of consistent estimators
of f in the sense that if one chooses m as a suitable function
of T, then as $T \to \infty$

$$I(\hat f_m; f) \to 0 \quad \text{and} \quad \hat f_m(w) \to f(w),$$

in probability, or with probability one, or in mean square. The
first rigorous proof of such results was given by Berk (1974)
who also finds the asymptotic variance of $\hat f_m(w)$,
confirming conjectures in Parzen (1969).

We now consider the problem of choosing m adaptively from a
sample of size T. Conceptually one would like to choose m to
minimize $I(\hat f_m; f)$. One approach to such a procedure is
given by Akaike (1974) and leads to an order determining
criterion called AIC.

The Akaike information criterion computes for m = 1, 2, ...

$$\mathrm{AIC}(m) = \log \hat\sigma_m^2 + \frac{2m}{T}$$

and an optimal order $\hat m$ satisfying

$$\mathrm{AIC}(\hat m) = \min_m \mathrm{AIC}(m).$$

The optimal order $\hat m$ can equal 0, indicating that the
time series is white noise. The value of AIC(0) can be adjusted
to the value one desires for the probability of rejecting the
hypothesis of white noise, when in fact the time series is white
noise. We recommend

$$\mathrm{AIC}(0) = -\frac{1}{T}.$$
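
Order selection by this criterion can be sketched in a few lines. The code assumes the criterion takes the usual form AIC(m) = log sigma_m^2 + 2m/T for m >= 1 together with AIC(0) = -1/T; the innovation variances below are invented for illustration and the function name is ours.

```python
import math

def aic_order(variances, T):
    """Return (best order, list of AIC values) for orders 0, 1, ..., len(variances) - 1.

    variances[m] is the order m innovation variance estimate; variances[0] is
    not used, because AIC(0) is set to the conventional value -1/T.
    """
    aic = [-1.0 / T]
    for m in range(1, len(variances)):
        aic.append(math.log(variances[m]) + 2.0 * m / T)
    best = min(range(len(aic)), key=lambda m: aic[m])
    return best, aic

# A series with clear second-order structure: the variance drops sharply up to
# m = 2 and only marginally afterwards.
best_ar, aic_ar = aic_order([1.0, 0.64, 0.60, 0.598, 0.597], T=100)

# A series that looks like white noise: no order improves on AIC(0).
best_wn, aic_wn = aic_order([1.0, 0.995, 0.990], T=100)
```

In the first case the penalty 2m/T outweighs the small variance reductions beyond m = 2, so order 2 is chosen; in the second case order 0 is chosen.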

Parzen (1974), (1977) proposes autoregressive order
determining criteria, called CAT, whose foundations are
different from those of AIC but which usually lead to exactly
equivalent orders in practice. The time series model
identification problem is to estimate the infinite
autoregressive transfer function

$$g_\infty(z) = 1 + a_\infty(1) z + \cdots + a_\infty(m) z^m + \cdots$$

by a sample order m autoregressive transfer function
$\hat g_m(z)$. To evaluate the overall mean square error it is
convenient to define

I ^ ^ g,(«^’'‘'')|2f(w)4v

which can be shown to be the sum of a variance term

^ ^ I ^ <<w)<iw

and a bias term

^ t<w)dw

One can show that the variance term is approximately

$$\frac{1}{T} \sum_{j=1}^{m} \sigma_j^{-2}.$$

CAT(m) is an estimator of the terms in the overall mean square
error which depend on m; thus one minimizes

$$\mathrm{CAT}(m) = \frac{1}{T} \sum_{j=1}^{m} \tilde\sigma_j^{-2} - \tilde\sigma_m^{-2},$$

where

$$\tilde\sigma_m^2 = \frac{T}{T-m}\, \hat\sigma_m^2$$

is an "unbiased" estimator of $\sigma_m^2$. At m = 0, we assign
CAT(0) = -(1 + (1/T)).

It should be noted that a multiple time series version of CAT
is given in Parzen (1977).

An order determining criterion which is consistent, but whose
behavior in practice is controversial, is given by Hannan and
Quinn (1979).


4. Log spectral kernel estimator and cepstral correlations.

The approach we have been describing for forming "optimal"
estimators $\hat f(w)$ of the spectral density f(w) of a
stationary time series is to view $\hat f(w)$ as a function
closest to $\tilde f(w)$ in a distance between spectral densities
given by the information divergence $I(\hat f; \tilde f)$. The
class of functions from which $\hat f(w)$ is chosen has been
constrained (or specified) parametrically, in the sense that
$\hat f(w)$ is of the form $f_{\hat\theta}(w)$, where
$\hat\theta$ estimates the parameters $\theta$ of a model
$f_\theta(w)$ for the true f(w).

A non-parametric constraint is to impose a smoothness measure
on $\hat f$ such as the integral square of the r-th derivative
of $\log \hat f(w)$, denoted

$$\int_0^1 |(\log \hat f(w))^{(r)}|^2\, dw.$$

One then seeks to choose $\hat f$ to maximize smoothness, while
minimizing a measure of distance of $\hat f$ from $\tilde f$.
Wahba (1980) introduces the estimation distance

$$\int_0^1 |\log \hat f(w) - \log \tilde f(w)|^2\, dw + \lambda \int_0^1 |(\log \hat f(w))^{(r)}|^2\, dw$$

where $\lambda$ is a penalty parameter to be determined
adaptively by the data. One may show that the resulting
estimators of g(w) = log f(w) are of the form, called log
spectral kernel estimators,

$$\hat g(w) = (\log f)^{\wedge}(w) = \sum_{v} \exp(-2\pi i w v)\, k\Big(\frac{v}{M}\Big)\, \tilde\gamma(v)$$

where

$$\tilde\gamma(v) = \int_0^1 \exp(2\pi i w v) \log \tilde f(w)\, dw$$

are called cepstral correlations, and the kernel k(x) is given
by [compare Parzen (1958)]

$$k(x) = \frac{1}{1 + x^{2r}}.$$

One often considers only two values for r, 2 and 4.

The statistical properties of cepstral correlations have been
extensively investigated by Bhansali (1974).
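
Cepstral correlations and the kernel weighting can be illustrated on a spectral density whose logarithm has a known Fourier series. The sketch below uses a first-order autoregressive density with lag-one correlation r1, for which gamma(0) = log(1 - r1^2) and gamma(v) = r1^v / v for v >= 1. The kernel k(x) = 1/(1 + x^(2r)) is an assumed form, chosen so that k(1) = 1/2 in keeping with the "half power" terminology; the truncation of the sum to a few lags and all function names are ours.

```python
import math

def cepstral_correlations(log_f, max_lag, n=2048):
    """gamma(v) = integral over (0,1) of exp(2 pi i w v) log f(w) dw, v = 0..max_lag.

    For a density symmetric about w = 1/2 the exponential reduces to a cosine.
    The integral is approximated by the midpoint rule.
    """
    return [sum(math.cos(2.0 * math.pi * v * (k + 0.5) / n) * log_f((k + 0.5) / n)
                for k in range(n)) / n
            for v in range(max_lag + 1)]

def kernel(x, r=2):
    """Assumed kernel 1 / (1 + x^(2r)); equals 1/2 at x = 1."""
    return 1.0 / (1.0 + x ** (2 * r))

def smoothed_log_density(gamma, M, w, r=2):
    """Kernel-weighted Fourier sum, truncated to the lags available in gamma."""
    return gamma[0] + 2.0 * sum(kernel(v / M, r) * gamma[v] * math.cos(2.0 * math.pi * w * v)
                                for v in range(1, len(gamma)))

r1 = 0.6

def log_f(w):
    """Log of the first-order autoregressive spectral density with lag-one correlation r1."""
    return math.log((1.0 - r1 ** 2) / (1.0 - 2.0 * r1 * math.cos(2.0 * math.pi * w) + r1 ** 2))

gamma = cepstral_correlations(log_f, 5)
g_hat = smoothed_log_density(gamma, M=3.0, w=0.1)
```

A lag v much smaller than M is passed almost unchanged, a lag equal to M is halved, and lags well beyond M are strongly damped.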

Since $k(v/M) = \frac{1}{2}$ for v = M, we call M the "half
power" lag. We want to adaptively determine M from the sample to
minimize the risk function

$$R_M = J(\hat f; f) = E\, L_2L(\hat f, f)$$

assuming log f(w) has a representation

$$g(w) = \log f(w) = \sum_{v} \exp(-2\pi i w v)\, \gamma(v).$$

Following Wahba (1980), to minimize $R_M$ one minimizes an
estimator of it of the form

$$\hat R_M = \hat B(M) + V(M,T)$$

where $\hat B(M)$ and V(M,T) are measures of bias and variance
given by

B(«l - ir - (y(v))^v'*''(l + . 

V(M,T) - ? 7^ 4 r O’- <iu . 

' “ 0 

A closed form evaluation of the integral in V(M,T) can be
obtained.

5. Iterated spectral estimation.

Observed time series do not usually obey the assumptions made
in the foregoing theory that Y(t) is a zero mean Gaussian time
series with summable correlation function. We call such a time
series a "short memory" time series (of which white noise is a
special case, called a "no memory" time series). Otherwise the
time series is called "long memory" (Parzen (1962)).

Autoregressive spectral estimators are especially suitable for
matching the large scale oscillations of the spectral density of
a long memory time series. The role of the autoregressive filter
is then to transform the time series to a short memory time
series [obtained as the residuals $\tilde\epsilon_m(t)$ described
in section 2]. The spectral density of the short memory series,
which can be regarded as the fine structure of the original
spectral density, can be estimated by a log spectral smoothing
estimator as well as by an autoregressive spectral estimator.
Employing two different approaches to short memory spectral
estimation is desirable since the problem of spectral estimation
is not simply a problem of parameter estimation but is also one
of model identification.

Iterated models for forecasting long memory time series are
used by Parzen (1981) under the name of "ARARMA models" (see
Appendix for an example).
APPENDIX: Wolfer Sunspot Numbers 1846-1963.

To illustrate the application of some of the foregoing ideas,
we report an iterated autoregressive model fitted to the annual
time series Y(t) of Wolfer's sunspot data for the years
1846-1963 (which is a sample of length T = 118). Our ARARMA
model fitting algorithm automatically proposes the following
model (which it hopes will have the best medium range, if not
long range, forecasting capability):

$$\tilde Y(t) = Y(t) - .482\, Y(t-10) - .554\, Y(t-11)$$

$$\tilde Y(t) - 1.009\, \tilde Y(t-1) + .362\, \tilde Y(t-2) = \epsilon(t)$$

The series $\tilde Y(t)$ is a short memory time series to which
Y(t) has been transformed by the initial autoregression on Y(t).
As an estimator of the true log spectral density log f(w) we
take, up to a normalizing constant,

\lo% c^iw) f • tlog f^iw)-' log f„,(w) 


where 

' s ‘ 

is the autoregressive spectral density corresponding to the
transformation from Y(t) to $\tilde Y(t)$.
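
The two-stage structure of the fitted model can be sketched in code. The coefficients below are those quoted for the sunspot model; the "+" sign on the .362 term is our reading of a damaged equation, and the identification of the log spectral estimator with minus the log squared modulus of each filter's transfer function is likewise an assumption made for illustration. The data passed to the filters here are a placeholder sequence, not the sunspot numbers.

```python
import cmath
import math

# Lag -> coefficient of each filter, written as sum_k c_k X(t - k).
LONG = {0: 1.0, 10: -0.482, 11: -0.554}    # Y~(t) = Y(t) - .482 Y(t-10) - .554 Y(t-11)
SHORT = {0: 1.0, 1: -1.009, 2: 0.362}      # eps(t) = Y~(t) - 1.009 Y~(t-1) + .362 Y~(t-2)

def apply_filter(coeffs, series):
    """Filter output for every t that has a full set of lagged values available."""
    p = max(coeffs)
    return [sum(c * series[t - k] for k, c in coeffs.items())
            for t in range(p, len(series))]

def transfer(coeffs, w):
    """Transfer function sum_k c_k z^k evaluated at z = exp(2 pi i w)."""
    z = cmath.exp(2j * math.pi * w)
    return sum(c * z ** k for k, c in coeffs.items())

def log_spectral_density(w):
    """Sum of the two autoregressive log spectral densities, up to an additive constant."""
    return (-2.0 * math.log(abs(transfer(LONG, w)))
            - 2.0 * math.log(abs(transfer(SHORT, w))))

placeholder = [float(t % 7) for t in range(40)]     # stand-in data only
y_tilde = apply_filter(LONG, placeholder)           # long memory stage
residuals = apply_filter(SHORT, y_tilde)            # short memory stage
value_at_zero = log_spectral_density(0.0)
```

Applying the long memory filter first and the short memory filter second consumes 11 and then 2 initial observations, so a series of length T yields T - 13 residuals.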

Figure 1 is a graph of the Wolfer sunspot data (the crosses
represent the one-step ahead predictors of the model above).
Figure 2 graphs the iterated autoregressive log spectral
estimator $(\log f)^{\wedge}(w)$.